Note: This analysis presents results of the 2021 Stack Overflow Developer Survey, focusing on the responses of 8193 US-based respondents. As the analysis focuses on questions around remuneration, any outliers above the 1.5 IQR boundary are removed. Generally, only complete survey responses are considered.
The largest group of US-based respondents lives in California (988), followed by Texas (575), New York (488), and Washington (487).
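The outlier rule mentioned in the note can be sketched as follows. This is a minimal Python illustration, not the code used for the report (which appears to have been produced in R). The function name `remove_upper_outliers` and the salary values are hypothetical, and the sketch assumes that "above the 1.5 IQR boundary" means above Q3 + 1.5 × IQR, with quartiles interpolated the way R's default `quantile()` does.

```python
import statistics


def remove_upper_outliers(values, k=1.5):
    """Drop values above Q3 + k * IQR (upper fence only, as an assumption)."""
    # method="inclusive" interpolates quartiles like R's default quantile type.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return [v for v in values if v <= upper_fence]


# Hypothetical yearly salaries in thousands of USD, with one extreme value.
salaries = [60, 75, 80, 90, 95, 100, 110, 120, 130, 150, 900]
cleaned = remove_upper_outliers(salaries)
print(cleaned)
```

Only the upper fence is applied here because the note speaks of outliers "above" the boundary; a symmetric rule would also drop values below Q1 − 1.5 × IQR.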
Three out of four survey participants are between 25 and 44 years old.
With regard to gender, the surveyed population shows a clear imbalance, as female participants account for only 6.35%.
The vast majority of participants are professional developers.
The largest share consider themselves Full-Stack Developers, while the specific job roles are manifold.
With regard to employment status, about 95% reported working for an employer, while freelancers and independent workers are a minority.
80% of the respondents hold at least a bachelor’s degree.
School, books, and online resources are named as the most frequent ways to learn coding.
More than half of all respondents claim to have written their first line of code during their adolescence.
Most of the participants find the answers to their problems through Google and Stack Overflow. Just as frequently, they suggest taking a break and coming back to the problem with a fresh mind.
Almost every second participant visits Stack Overflow on a daily basis.
To gain a deeper understanding of the distribution of the salaries and to illustrate the Central Limit Theorem, random sampling is applied to generate samples of sizes 20, 30, 40, and 50. The Central Limit Theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed (Source: Wayne W. LaMorte - Boston University School of Public Health). For each sample size, the sample mean is computed 10,000 times. As illustrated below, the standard deviation of the sample means shrinks as the sample size increases.
## [1] "Sample Size: 20, Mean: 128031.970000, Standard Deviation, 11329.660000"
## [2] "Sample Size: 30, Mean: 128032.480000, Standard Deviation, 9341.340000"
## [3] "Sample Size: 40, Mean: 128171.880000, Standard Deviation, 8135.990000"
## [4] "Sample Size: 50, Mean: 128107.030000, Standard Deviation, 7227.590000"
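The procedure behind these figures can be sketched as follows. This is a minimal Python illustration rather than the original code (the report appears to have been produced in R). The function name `sample_means` is hypothetical, and because the survey data is not reproduced here, the `population` is a synthetic right-skewed stand-in, so the printed numbers will not match the ones above.

```python
import math
import random
import statistics


def sample_means(population, sample_size, repetitions=10_000, seed=0):
    """Draw `repetitions` samples with replacement and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(repetitions)
    ]


# Hypothetical stand-in for the salary column: a synthetic log-normal population.
_rng = random.Random(42)
population = [_rng.lognormvariate(11.7, 0.4) for _ in range(8000)]
sigma = statistics.pstdev(population)

results = {}
for n in (20, 30, 40, 50):
    means = sample_means(population, n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    # The standard deviation of the sample means should be close to sigma / sqrt(n).
    print(
        f"n={n}: mean={results[n][0]:.2f}, sd={results[n][1]:.2f}, "
        f"sigma/sqrt(n)={sigma / math.sqrt(n):.2f}"
    )
```

The reported values are consistent with the σ/√n relationship: multiplying each reported standard deviation by the square root of its sample size gives roughly 51,000 in all four cases.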
Sampling is used when we want to identify patterns that can be observed within a subset of the whole data. We sample our data based on the attribute ‘US_state’ and use ‘CONVERTERCOMPYEARLY’ as the value whose distribution is examined. Four sampling techniques (SRS without replacement, systematic sampling, inclusion probabilities, and stratified sampling) are compared to the population dataset as a whole.
The samples from systematic and stratified sampling generally have the same minimum value as the population dataset, while SRS without replacement has a slightly higher minimum and inclusion probabilities a much higher one. All four samples have higher Q1, mean, Q3, and maximum values than the population dataset, with inclusion probabilities showing the highest of the four. Of the four techniques, systematic sampling yields the distribution most similar to the population dataset and hence appears to be the most suitable technique here.
## [1] 811.1
## Stratum 1
##
## Population total and number of selected units: 968 276.5568
## Stratum 2
##
## Population total and number of selected units: 352 100.5661
## Stratum 3
##
## Population total and number of selected units: 475 135.7071
## Stratum 4
##
## Population total and number of selected units: 571 163.1342
## Stratum 5
##
## Population total and number of selected units: 473 135.1357
## Number of strata 5
## Total number of selected units 811.1
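Three of the techniques compared above can be sketched in Python as follows. This is a rough sketch rather than the original code (the report appears to have been produced in R). All function names and the synthetic `rows` are hypothetical. The stratum sizes and the total of 811.1 are taken from the output above as given, and rounding the fractional allocations to whole units is an assumption made here so that units can actually be drawn.

```python
import random


def proportional_allocation(stratum_sizes, total_sample_size):
    """Split the total sample size across strata in proportion to their size."""
    population_size = sum(stratum_sizes)
    return [total_sample_size * size / population_size for size in stratum_sizes]


def srs_without_replacement(rows, n, rng):
    """Simple random sample: every subset of size n is equally likely."""
    return rng.sample(rows, n)


def systematic_sample(rows, n, rng):
    """Pick every (len(rows) / n)-th row after a random start."""
    step = len(rows) / n
    start = rng.random() * step
    return [rows[min(int(start + i * step), len(rows) - 1)] for i in range(n)]


def stratified_sample(rows, stratum_of, sizes, rng):
    """Draw an SRS without replacement of sizes[s] rows inside each stratum s."""
    strata = {}
    for row in rows:
        strata.setdefault(stratum_of(row), []).append(row)
    sample = []
    for stratum, members in strata.items():
        sample.extend(rng.sample(members, sizes[stratum]))
    return sample


# Stratum sizes and total sample size as reported in the output above.
stratum_sizes = [968, 352, 475, 571, 473]
allocation = proportional_allocation(stratum_sizes, 811.1)
print([round(a, 4) for a in allocation])

# Synthetic (stratum index, salary) rows standing in for the survey data.
rng = random.Random(1)
rows = [
    (s, rng.lognormvariate(11.7, 0.4))
    for s, size in enumerate(stratum_sizes)
    for _ in range(size)
]
sizes = {s: round(a) for s, a in enumerate(allocation)}
stratified = stratified_sample(rows, lambda row: row[0], sizes, rng)
srs = srs_without_replacement(rows, len(stratified), rng)
systematic = systematic_sample(rows, len(stratified), rng)
```

The proportional allocation reproduces the per-stratum unit counts listed in the output, which suggests the selection was allocated in proportion to stratum size. Sampling with unequal inclusion probabilities is left out of this sketch.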
Age and salary show a moderate positive correlation, with a Pearson coefficient of about 0.32. It also appears that the upper end of the pay range can already be reached at an early career stage.
Pearson Correlation:
## [1] 0.3179661
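For reference, the Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. The following Python sketch computes it from scratch. The function name `pearson_r` and the age and salary values are hypothetical and serve only to illustrate the calculation, so the result differs from the value reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical ages and yearly salaries (USD), for illustration only.
ages = [23, 27, 31, 35, 40, 46, 52]
salaries = [85_000, 120_000, 95_000, 140_000, 110_000, 150_000, 125_000]
print(round(pearson_r(ages, salaries), 3))
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.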
Although salaries are widely spread within every education level, the boxplot illustration suggests that survey participants holding a Master’s or Doctoral degree earn higher salaries on average (135k), with only a slight difference between these two groups.
The size of the employer seems to play a crucial role, as larger organizations are able to pay higher salaries than their smaller competitors.
The density graphs below suggest that female survey participants earn less than their male colleagues. This is in line with the findings of the U.S. Bureau of Labor Statistics (source: bls.gov/cps/earnings.htm).
When comparing men and women with regard to their educational level, the bars below suggest that women more often hold a Master’s or Doctoral degree.
The density distribution below suggests that male survey participants have, on average, more years of professional experience, whereas women are more strongly represented among the younger age classes.
Source: The data has been retrieved from the Google Trends API using the gtrendsR package.